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<^7H1 7fl A] ^ M^o]^l- Oj-g-^1- «7l ofl^ ^^r, T^tt ^<>H 

^^-^1 ^ = 2fli ^tH}. ^ ^ »<HH1 ^«fl>M, 

S. 5 
A BRANCH PREDICTION METHOD USING ADDRESS TRACE

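FIG. 1 is a diagram showing four generations of microarchitecture;
FIG. 2b is a diagram showing a conventional trace cache;

FIG. 3 is a diagram showing an example of repeatedly executed instructions;

FIG. 6 is a diagram showing the address trace cache according to the present invention for the example of FIG. 3.

22 : trace cache 220 : address trace cache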
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<io> &7} cfl^ « o v^d] ^ ^ ^ ci o.3.fe Meflol^ 

» °l-8-fr £-71 ^1# "J^l 3rtr 3°14. 

<11> FIG. 1 shows four generations of microarchitecture, as presented in James E. Smith and Sriram Vajapeyam, 'Trace Processors : Moving to Fourth-Generation Microarchitectures', IEEE Computer, pp. 68-74, September 1997. Referring to FIG. 1, (a) shows the first-generation microarchitecture, the serial processor, which was in use from the first electronic computers of the 1940s until the 1960s. A serial processor fetches one instruction and executes it before moving on to the next. FIG. 1 (b) shows the second-generation microarchitecture, which uses pipelining and remained in use as the standard for high-performance processors until the late 1980s. FIG. 1 (c) and (d) show the third- and fourth-generation microarchitectures. The third generation uses superscalar processors, which came into commercial use in the late 1980s and handle several instructions at the same time. Because several instructions are processed simultaneously, pipeline stalls must be kept to a minimum.

Branch prediction is therefore an important technique for exploiting the parallelism of a superscalar processor. One known scheme is bimodal branch prediction. Other schemes predict the direction of a branch from the branch history: a scheme that uses the history of each individual branch is called local branch prediction, and a scheme that uses the combined history of all recent branches is called global branch prediction. These branch prediction schemes are described in Scott McFarling, 'Combining Branch Predictors', WRL Technical Note TN-36, pp. 1-19, June 1993. WRL (Western Research Laboratory) was established in 1982 by Digital Equipment Corporation. In addition to the schemes above, there is combined branch prediction, which combines different branch prediction schemes such as local branch prediction.
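As a rough illustration of bimodal branch prediction, the sketch below keeps a table of 2-bit saturating counters indexed by the low bits of the branch address, in the style described in McFarling's note. The class name, the table size, the initial counter value and the 4-byte instruction size are assumptions made for this example and are not taken from this document.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by branch address (illustrative)."""

    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        # Assumption: every counter starts at 1 ("weakly not taken").
        self.counters = [1] * (1 << table_bits)

    def _index(self, branch_address):
        # Assumption: 4-byte instructions, so the two low address bits are dropped.
        return (branch_address >> 2) & self.mask

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, branch_address, outcomes):
    """Predict, compare with the real outcome, then train, for each outcome."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_address) != taken:
            misses += 1
        predictor.update(branch_address, taken)
    return misses


# A loop-closing branch: taken 29 times, then not taken once at loop exit.
loop_outcomes = [True] * 29 + [False]
misses = count_mispredictions(BimodalPredictor(), 0x1000, loop_outcomes)
```

With these assumptions the predictor misses once while warming up and once more at the loop exit, which is the kind of loop-exit misprediction that loop-aware schemes try to avoid.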
<13> A branch predictor that uses branch prediction predicts the outcome of a branch before the branch is resolved, so that the instructions following the branch can be supplied without waiting. The CPU (Central Processing Unit) fetches the next instructions according to this prediction. If the prediction is wrong, the instructions that were fetched along the wrong path cannot be used, and the CPU pipeline stalls while the correct instructions are fetched again. This loss is called the mispredicted branch penalty.
<14> sj-oln^ol^- 7 >Xl^ J£^- ^ Afo]^o^ «-7l 5)1^ Ell- 7^ES, 

<« »^ 4-^1^1 ^^h} -MtIHI^ 87fl^l ^<>H1 tfltr 3f°lS5}- 
91°} n^^., Cl ^^^1-^1 ^o] ^AHl ^-7^ ^ ^71 *|) 

^El7> ^7>S»0 ^4. ^tiV^ SSZL^-g;^ 4 tfl*l 6 ^<H*H} ^7l» * r 7fl 3 

71 nfl^l, ^-^^-tr &7l efl^£. s>ol ^3] ^^5} Sj-ol^oj ^jofl 

^Efl^Cf. 

<15> To reduce the branch misprediction penalty described above, trace processors that use a trace cache have come into use. Trace processors are presented in the article by James E. Smith and Sriram Vajapeyam cited above, 'Trace Processors : Moving to Fourth-Generation Microarchitectures'.

FIG. 2a shows a dynamic sequence as it is stored in a conventional instruction cache (21), and FIG. 2b shows a conventional trace cache (22). Referring to FIG. 2a, the arrows indicate 'taken' branches (jumps to the branch target address), and the instruction sequence 'ABCDE' is held in the instruction cache in separate locations linked by those branches. In the trace cache (22) of FIG. 2b, by contrast, a snapshot of the dynamic instruction stream, called a trace, is stored as one unit. The trace cache (22) is described in J. Johnson, 'Expansion Caches for Superscalar Processors', Technical Report CSL-TR-94-630, Computer Science Laboratory, Stanford Univ., June 1994; A. Peleg and U. Weiser, U.S. Pat. No. 5,381,533, 'Dynamic flow instruction cache memory organized around trace segments independant of virtual address line', January 1995; and E. Rotenberg, S. Bennett and J. Smith, 'Trace cache : A Low Latency Approach to High Bandwidth Instruction Fetching', Proc. 29th Int'l Symp. Microarchitecture, pp. 24-34, December 1996.

<18> In addition, the trace cache (22) is described in Sanjay Jeram Patel, Marius Evers and Yale N. Patt, 'Improving Trace Cache Effectiveness with Branch Promotion and Trace Packing', Proc. 25th Int'l Symp. Computer Architecture, pp. 262-271, June 1998; Sanjay Jeram Patel, Daniel Holmes Friendly and Yale N. Patt, 'Evaluation of Design Options for the Trace Cache Fetch Mechanism', IEEE TRANSACTIONS ON COMPUTER, VOL. 48, NO. 2, pp. 193-204, February 1999; and Eric Rotenberg, Steve Bennett and James E. Smith, 'A Trace Cache Microarchitecture and Evaluation', IEEE TRANSACTIONS ON COMPUTER, VOL. 48, NO. 2, pp. 111-120, February 1999.

<19> <£6\}x\ ^^tb w>^r 5. 2a«^l S.*]& ^*ge>l 71)43(21) vflofl M- 

^ 1:^-^4 ^3 *l€^r, £ 2b ^1 J=3H^ n$\(22) iflofl <3^r3 °- 

^^7> ^^sfl^S. ^30} «-7l# ^^^1 ^Jl£ E^oji 7fl£|(22Hl 

^^fr8r ^>^-2.S ^« ^ U*BM, 71 ^-71 ^SflSl^ 

^-71 ^^Hlf « 0 M^- ^ o^e}-, t^^cH 7fl^|(21)5l Ir^^o] ^^o)l 

Sl^r M3H^ ^fl^ (22) iflcfl <£^3-2-5. ^^o.S^ i4 ^1^.^ ^1 ^ 

<20> ZLCluf #7l<2)- ^ M£)H^ 7fl4=l(22)^, ^^<H *r*ll» ^ ^>7l nfl^ofl o) ^ ^ 

ofl tfl-g-S)^- O^ri i^^o} t\=3.r% 4^°1 ^-S-^Cf. ZLClJl, >#7l<4 ^ J=2flo}^ 
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^^S* *WKr Se)H^ n$\{22)7\ ^*§<H# ^#§- 37l» # 

711^(22)^ ^H^efli *l*hi- ^ XI 31, -g-^* *}-g-*H 3 

a>o]^ ^ ^- 7 ># ^ $X±= ^7} <4|^ «<HH A^^rf. 

<2i> rcj-ej-^ , ^ sl^^. #^*v ^tiV -g-*)!^ «fl^s->7l ^Sfl m 

^7>» #^ ^ ^-71 afl# «<Hj-g; *)l^RrHl ^rf. 

<22> #^*V v}<% ^ ^- ^*>7l -^H] 

£r ^-f ^-Hlol A^S}^ <^H.5l)i, #7] ^floj ^ILi]^ <H^3l^, ^7l 

^Ml^ ^Tfl c^-fli ^ ^ #71 ^^1^ ^» 7>£MS}o} *l#*Kr #7fl 

<23> #71 J=3H^ 7fl3^ ( *V^£)^r ^^<>ltS °]^°)& ^71 ^f§S\^ ^-f, 
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<24> «-7l cfl^ ^"71 ^-ZE n^-Eil-^ ^S^T Wjiil^H , 

^"71 ^ 7jl^7> *]S. £Sr nfl ^ ^ t^-g-ofl A]*} o^S. iSfl^ o]^) 

^}*Kr S^t}. ZielJl, ^71 c^l^o] ^o. ^. 7 ] J=Lir ^^1- 

*fl^*Rr ^"71 if- = ^Ei^ ^tflolE £ ijLxr 

<25> (>^AH) 

<26> o]-s\ -g- tcj-^ ^aHI^- £^ £ 3 tfl^l £ 6# #S*>^ ^Ml^l ^ 

<27> Al^-^V E^]o]i 7fl4=l^, tfl-g-S]^ ^JELEfli EE|lo]A 7>^|# 

<28> FIG. 3 shows an example of instructions that are executed repeatedly in loops.

<29> Referring to loop ① shown in FIG. 3, instructions A and B are executed repeatedly 30 times. When loop ① finishes, loop ② is executed, and in loop ② instructions C, D and E are executed repeatedly 20 times. In loop ③, which follows, instructions F and G are executed repeatedly 40 times.

<30> The contents of the conventional trace cache (22) for these loops are as shown in FIG. 4.

<31> Referring to FIG. 4, the conventional trace cache (22) stores the same instructions over and over. The trace cache (22) needs a data area for storing the 60 instructions of loop ① (2 instructions × 30 iterations = 60), a data area for storing the 60 instructions of loop ② (3 instructions × 20 iterations = 60), and a data area for storing the 80 instructions of loop ③ (2 instructions × 40 iterations = 80). That is, a total of 200 data entries are needed to store loops ① to ③ of FIG. 3. If each of these instructions is 32 bits long, storing loops ① to ③ requires a data area of 6,400 bits (that is, 800 Bytes).
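The arithmetic in this paragraph can be checked with a few lines of Python. The loop sizes, iteration counts and the 32-bit instruction width are the ones stated in the text; the variable names are chosen only for this illustration.

```python
# (instructions per iteration, iterations) for loops 1 to 3, as stated in the text.
LOOPS = {1: (2, 30), 2: (3, 20), 3: (2, 40)}
INSTRUCTION_BITS = 32

# A conventional trace cache stores every dynamic instance of every instruction.
entries_per_loop = {n: size * count for n, (size, count) in LOOPS.items()}
total_entries = sum(entries_per_loop.values())
total_bits = total_entries * INSTRUCTION_BITS
total_bytes = total_bits // 8

print(entries_per_loop, total_entries, total_bits, total_bytes)
```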

<32> FIG. 5 shows the structure of the address trace cache (220) according to the present invention. As shown in FIG. 5, the address trace cache (220) according to the present invention consists of a start address indicating where each routine begins, an end address indicating where each routine ends, a current access loop counter for counting the number of times the routine is currently being accessed, and an old access loop counter indicating the number of times the routine was accessed previously.
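One way to picture a single entry of such an address trace cache is as a record with the four fields named above. The sketch below is only an illustration: the class name, the Python types, the 32-bit field width and the example addresses are assumptions, not details given in this document.

```python
from dataclasses import dataclass, fields


@dataclass
class AddressTraceEntry:
    """One routine (loop) as described by addresses and counters, not instructions."""
    start_address: int                    # address of the first instruction of the routine
    end_address: int                      # address of the last instruction of the routine
    current_access_loop_counter: int = 0  # accesses counted in the current run
    old_access_loop_counter: int = 0      # access count recorded from the previous run


FIELD_BITS = 32  # assumption: every field is as wide as one instruction


def entry_bits():
    """Storage needed for one entry under the width assumption above."""
    return len(fields(AddressTraceEntry)) * FIELD_BITS


# Hypothetical addresses for a two-instruction loop that previously ran 30 times.
loop1 = AddressTraceEntry(start_address=0x100, end_address=0x104,
                          old_access_loop_counter=30)
```

The point of the layout is that the entry size does not depend on how many times the loop body is executed.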

<33> For example, for the instructions executed in loop ①, the address trace cache (220) stores the start address and the end address of loop ①. The current access loop counter holds the number of times loop ① has been accessed so far, and the old access loop counter holds the total number of accesses of loop ① (for example, 30). When the value of the current access loop counter becomes equal to the value of the old access loop counter, the loop is regarded as finished, and the address that follows becomes the NFP (next fetch point).
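The following sketch shows one possible reading of how the two loop counters could select the NFP (next fetch point). It is an assumption-laden illustration rather than the method claimed here: the function names, the reset of the current counter on loop exit, the 4-byte instruction size and the example addresses are all hypothetical.

```python
def next_fetch_point(start_address, end_address, current_count, old_count,
                     instruction_bytes=4):
    """Called when the end address of a loop is reached.

    Returns (next fetch address, updated current count).
    Assumption: the loop is predicted to repeat until the current count
    reaches the count recorded in the old access loop counter.
    """
    current_count += 1
    if current_count < old_count:
        # Predict another pass: fetch from the start of the loop again.
        return start_address, current_count
    # Counts are equal: predict loop exit and fetch the instruction after the loop.
    return end_address + instruction_bytes, 0


def passes_until_exit(start_address, end_address, old_count):
    """Count how many passes are predicted before the loop is left."""
    current, passes = 0, 0
    while True:
        passes += 1
        nfp, current = next_fetch_point(start_address, end_address,
                                        current, old_count)
        if nfp != start_address:
            return passes, nfp


# Hypothetical addresses for loop 1 (instructions A and B), old count 30.
passes, exit_address = passes_until_exit(0x100, 0x104, 30)
```

Under these assumptions the loop exit is predicted on exactly the pass on which the previous run ended, so a loop with a stable trip count would not incur an exit misprediction.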
<34> As shown in FIG. 3 and FIG. 5, the repeatedly executed loops ① to ③ are each stored in the address trace cache (220) in the manner described above.

<35> FIG. 6 shows the contents of the address trace cache (220) according to the present invention for the example shown in FIG. 3.

<36> Referring to FIG. 6, for loop ① shown in FIG. 3, the address trace cache (220) stores the address of instruction A as the start address of loop ① and the address of instruction B as the end address of loop ①, and the number of iterations is stored as 30. Information on the number of accesses of loop ① is thus held in the address trace cache (220).
<37> As shown in FIG. 6, the address trace cache (220) according to the present invention needs a total of 12 data entries to store loops ① to ③ of FIG. 3. If each of these entries takes 32 bits, storing loops ① to ③ requires a total of 384 bits (that is, 48 Bytes), a data area about 16.7 times smaller than that of the conventional trace cache (22).
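The figures in this paragraph can be reproduced as follows. The decomposition of the 12 entries into 3 loops with 4 fields each (start address, end address and the two loop counters) is an assumption that happens to match the stated total; the 800-byte figure for the conventional trace cache is the one given earlier in the text.

```python
LOOP_COUNT = 3        # loops 1 to 3 of FIG. 3
FIELDS_PER_LOOP = 4   # assumption: start address, end address, two loop counters
FIELD_BITS = 32       # width stated in the text

address_trace_entries = LOOP_COUNT * FIELDS_PER_LOOP
address_trace_bits = address_trace_entries * FIELD_BITS
address_trace_bytes = address_trace_bits // 8

CONVENTIONAL_BYTES = 800  # conventional trace cache, as stated earlier in the text
reduction = CONVENTIONAL_BYTES / address_trace_bytes

print(address_trace_entries, address_trace_bits, address_trace_bytes,
      round(reduction, 1))
```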

<38> olSf ^ Jl-^, ti>^-£l^r #6] s.^^^ > ZL^Jl ^^^f2] 

SH^^r ^^7} tH^- T^t}. ^o], ^ofl 01^.511^ MBflol^ 

7114=1(220)^ ^2fl3 JlelH^ Tfl^ofl cflolBl ^-g- A>-g-^> 7 l 

i ^eflo]i 711^(220)^- *3^°H ^ o]]=$)£l Meflol^ ^11- $ 

eflS ^^>M, ^<H^ ^^el]^ ^12^ Al^ ^ ^cf. 

<39> O]^^, ^ofl ^ ^ ^71^- -g^ ^ JE^l 1451- S. 

°l*r 3H1 €■ 7l#^ AHM- $^14*1 # 

JL4] 

<40> o]^-s\. Z>£- ig-Tgofl o)*>^, ^^C]6\] tfl-§-S]ir ol£?fli Eeflol^ ^11- tliil 
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11 

^Sl^ o]^o]^[ JfL^o] ^6gs]^ ^-f , ^^^^ ^ 

3]^, ^s]^ <HH3]^, #7] ^-^^ ^7fl 53 ^ ^-71 

= JE=e)H^# ol-g-fV ^7] o^l^ 
[^^8- 2] 

*fi l ^1 91°]*], 

3] 

#7] £-71 «<Hj£- ( 
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<#7] ^S. f}^B\^o\] <i\n alULs}^, ^7) ^ n^rA ^3- ^ 

[3^* 4] 

*H 2 



1019990050627 



2000/3/2 



[FIG. 1] Four generations of microarchitecture: panels (a), (b), (c).

[FIG. 2b] Trace cache (22) holding the instruction sequence A, B, C, D, E.

[FIG. 3]

for (i=1 ; i<30 ; i++)
{
    instruction A ;
    instruction B ;
}                               loop ①

for (j=1 ; j<20 ; j++)
{
    instruction C ;
    instruction D ;
    instruction E ;
}                               loop ②

for (k=1 ; k<40 ; k++)
{
    instruction F ;
    instruction G ;
}                               loop ③

[FIG. 4] Conventional trace cache (22): the instructions A, B of loop ① stored repeatedly, followed by the repeated instructions C, D, E of loop ② and the repeated instructions F, G of loop ③, with the NFP marked.


[FIG. 5] Address trace cache (220): each entry holds a Start Address, an End Address, a Current Access Loop Counter and an Old Access Loop Counter, with the NFP marked.


[FIG. 6] Address trace cache (220) for the example of FIG. 3:

Loop ①: A Add., B Add., i, 30
Loop ②: C Add., E Add., 20
Loop ③: F Add., G Add., 40